Comparing Shortest Paths Lengths of Free and Proprietary Data for Effective Pedestrian Routing in Street Networks
Authors
Zielstra and Hochmair

Abstract
Ubiquitous mobile devices, such as smartphones, have led to an increased popularity of pedestrian routing applications over the past few years. Since pedestrians typically aim to minimize their walking distance, especially within non-recreational and multi-modal trips, pedestrian routing systems will only be fully utilized if they are able to find the correct shortest path and thus help to avoid unnecessary detours. GPS-based car navigation systems became standard several years ago, which has led to the availability of accurate street network data for car-based routing applications. However, pedestrian routing applications should also consider pedestrian-related network segments besides those used by motorized traffic, including footpaths and pedestrian bridges. This paper performs a shortest path analysis of pedestrian routes for cities in Germany and the US. More specifically, for a set of 1000 randomly generated origin-destination pairs, we compare the lengths of pedestrian routes computed from different freely available network sources, such as OpenStreetMap (OSM) and TIGER/Line data, and proprietary datasets, such as TomTom, NAVTEQ, and ATKIS. The results show that freely available data sources such as OpenStreetMap provide a relatively comprehensive option for cities where commercial pedestrian data sets are not yet available.

INTRODUCTION
When walking from a trip origin to a destination, a pedestrian typically tries to minimize a variety of costs (disutilities), including distance, travel time, obstacles, turns, danger, slope, complex intersections, or uncertainties along the way.
There exists a large body of research that analyzes the different criteria a pedestrian may consider when choosing a route from among a variety of path alternatives (24, 9, 14, 17, 7, 21), how the importance of these criteria changes with the trip purpose (2), and how criteria are applied in combination with other transportation modes, i.e., within a multi-modal trip (20, 26, 4).

In multi-modal trips, travelers aim to minimize the distance and time of the walking portion of their trip, where the observed average walking distances are affected by demographic variables (income, education), characteristics of the pedestrian environment, such as topography and safety of intersections, and population density, among others (20, 27). A high percentage of pedestrian trips within the US are shorter than one mile or half a mile, and some increase in walking popularity has been reported (13). Walking is also the dominant mode of access to public transit over short access distances; correct modeling and the use of the most detailed datasets for pedestrians are therefore important. The results of an Onboard Transit Survey by the Atlanta Regional Commission showed that most respondents walked to the transit station (72.4%), with a distance of less than 1/8 mile (53%) (8). Another study from the Netherlands showed that for rail transport walking is the dominant mode of access up to about 1.2 km (0.75 mi) at the home end, and up to 2.2 km (1.4 mi) at the activity end (22).

The relatively new development of mobile devices with GPS and Wi-Fi positioning capabilities now makes it possible to run routing applications and other location-based services on these small devices (1, 19). To be useful for pedestrian navigation, these services need to be built on graph network structures with the required spatial granularity and detail.
This is because, compared to car drivers, pedestrians use different connections and are not bound to lanes or to any other car navigation restriction (6). Shortest (or fastest) path is one of the most prominent route selection criteria in pedestrian navigation (9). It is often the default search mode in routing applications, alongside alternative routes that are computed based on other optimization criteria (15). It is therefore relevant to determine which commonly available street network data provide a useful basis for pedestrian routing. In this paper we analyze a variety of freely available and proprietary network datasets regarding shortest path lengths for two US cities (Miami and San Francisco) and two German cities (Berlin and Munich). The availability of pedestrian segments and their effect on shortest route generation is analyzed for the OSM, ATKIS (German government data), TIGER/Line, TomTom (formerly known as TeleAtlas) and NAVTEQ street datasets.

With the growing number of complementary geospatial data sets, the integration of multiple network datasets has become a major research focus among transportation engineers, geographers, and computer scientists. The datasets often originate from different sources and vary in spatial accuracy, level of generalization, and attributes, which requires different approaches to the combination (conflation) of datasets (25, 3, 5) to facilitate their combined use in transportation applications. However, this paper does not consider conflation techniques and focuses on the individual potential of widely used data sources for shortest path generation for pedestrians.

The remainder of the paper is structured as follows: The following section gives an introduction to the commercial and free street datasets used in this study and reviews previous findings on geodata comparison tests in the US and Germany.
The next section provides information about the applied analysis methods, followed by the results on the comparison of differences in path lengths found between different datasets. This is followed by a discussion of the results and aspects of future work.

VOLUNTEERED GEOGRAPHIC INFORMATION AND PROPRIETARY STREET DATASETS

Previous Work
The high demand for freely available spatial data in recent years has boosted the availability of Volunteered Geographic Information (VGI) (10) on the Internet. The development of Web 2.0, together with the Global Positioning System (GPS) and its integration into cell phones, cameras, and other mobile devices, allows members of the Web community to interact with each other, provide information to central sites, and thus become a significant source of geographic information. The OpenStreetMap (OSM) project (available at openstreetmap.org) has the goal of creating a detailed map of the world based on VGI in vector data format. The information is collected by many participants, collated in a central database and distributed in multiple digital formats through the World Wide Web. It provides free downloads of spatial data from a large selection of themes, including roads, transit lines, tourist sites, and land use layers. Whereas most commercial providers of street data focus primarily on car navigation, OSM data also include pedestrian and bicycle paths, which makes them a potentially rich data source for off-street routing. In recent years the collection of VGI around the world has created a fast-growing selection of free street data.

In connection with this development, the quality of geodata, and especially of VGI, has become a major research area. The quality of OSM data for England, Germany and parts of Florida, in particular the aspects of completeness, positional accuracy, and attribute accuracy, has been analyzed in recent years (12, 29, 28).
For Germany, a comparison of coverage between TomTom and OSM data conducted for several periods during 2009 revealed the largest discrepancies for car navigation related data, with total lengths being 43% smaller for OSM than for TomTom datasets. This indicates that TomTom at that time specialized in car-related data. For pedestrian navigation related data, the total length differences decreased from 27% to almost 10% over time, providing further evidence for the assumption that OSM specializes in smaller streets and alleys. The completeness of OSM data shows a spatial pattern: it decreases clearly from the metropolitan areas towards the surrounding rural areas. However, within more densely populated areas, OSM data offer more overall data than the commercial provider (29).

Additional proprietary datasets for Germany can be obtained from the German Federal Office of Cartography and Geodesy and its corresponding regional surveyor's offices. ATKIS (Amtliches Topographisch-Kartographisches Informationssystem) includes several different formats and datasets for all of Germany; however, charges apply that depend on the size of the area and the number of requested datasets. ATKIS data were not included in the aforementioned analyses, so no comparison results are available for them. For our pedestrian-specific analysis, one ATKIS dataset has been added for the city of Berlin in this paper.

The results of the comparison of coverage between NAVTEQ, TomTom and OSM in the United States, in particular the state of Florida, showed a different pattern. As opposed to the OSM coverage pattern observed for England and Germany, OSM data coverage is generally higher in rural areas than in urban areas, when compared to commercial datasets.
The good coverage results for OSM in rural areas are not a result of user contributions but are primarily based on the TIGER/Line import, since TIGER/Line contains more data than the commercial data providers, especially in agricultural areas (small ways). Despite the lower coverage of OSM in urban areas compared to commercial datasets, comparisons between OSM and TIGER data reveal that for some urban areas OSM data are actively collected by the Web community (29).

Since the computation of shortest paths for pedestrian routing is based on the underlying street network, it is of interest to compare the suitability of various data sets for shortest path computations. It should be noted that we did not check any of the datasets against actual ground truth, since the test areas are too large to do so. However, the study provides a relative comparison of shortest path lengths for different datasets as a proxy for data completeness. Most of the tested datasets are currently used in routing applications and available to users and decision makers.

CASE STUDY
For this study we computed shortest path distances of sample routes for four cities based on different street datasets. Different datasets were available for different cities. From the commercial datasets, NAVTEQ Discover Cities data were available for Miami, while we had access to TomTom data for both US and German cities. In terms of free data, OSM network data were available for all cities, while TIGER/Line is a US data standard and therefore only available for the two US cities. Its counterpart for Germany, the ATKIS dataset, was provided to us for Berlin.

While the test area of each German city was determined through official city boundaries included in the TomTom dataset, county boundaries for Miami-Dade County and San Francisco County were used for the US cities.
Test area sizes were 121 km² for San Francisco and 5040 km² for Miami in the US, and 892 km² for Berlin and 310 km² for Munich in Germany. Three of the four tested cities (Miami, Munich and Berlin) show a widespread street pattern that is interspersed with parks and rivers. San Francisco shows a more compact street pattern because of the limited city area. Two major parks are located in the western part of the city. The public transportation network of a city can also influence the construction of pedestrian-friendly infrastructure and pedestrian travel behavior in many ways. While the German cities and San Francisco have a sophisticated public transportation network, Miami lacks this kind of system away from the CBD area.

Extraction of Pedestrian Related Data
To be able to compute shortest routes for pedestrians, all network datasets needed to be filtered, which means that segments that were only accessible to cars, such as interstate highways, were removed. Due to the different attributes and road classification schemes found in the datasets, the extraction method had to be adapted for each dataset. For example, for NAVTEQ data a query over street attributes formulated as [Pedestrian Access] = 'Y' was used to extract network segments that are either restricted to pedestrians only, or that can be accessed by both cars and pedestrians. For TIGER/Line, street segments that prohibit pedestrian access were eliminated by street name. OSM allows filtering the street network through specific tags that indicate accessibility for pedestrians. OpenRouteService, a routing application that runs on OSM data (available at openrouteservice.org), provides a wiki with a comprehensive list of street tags that can be used for this purpose. TomTom and ATKIS provide a documented list of street attributes that can be used to filter pedestrian-accessible network segments.
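The attribute-based filtering described above can be illustrated with a minimal Python sketch. It is not the ArcGIS workflow used in the study. Only the NAVTEQ field name "Pedestrian Access" and its value 'Y' come from the text; the record layout, the function names, and the choice of OSM tag values (highway=motorway, highway=motorway_link, foot=no) are assumptions made for illustration.

```python
# Hypothetical segment records: apart from NAVTEQ's "Pedestrian Access"
# field, every attribute name and tag value below is an assumption.

# Assumed OSM highway classes that are closed to pedestrians.
OSM_CAR_ONLY_CLASSES = {"motorway", "motorway_link"}


def navteq_is_walkable(segment):
    """Mirror the query [Pedestrian Access] = 'Y' quoted in the text."""
    return segment.get("Pedestrian Access") == "Y"


def osm_is_walkable(tags):
    """Keep tagged ways unless they are car-only or explicitly forbid walking."""
    highway = tags.get("highway")
    if highway is None or highway in OSM_CAR_ONLY_CLASSES:
        return False
    return tags.get("foot") != "no"


def filter_segments(segments, is_walkable):
    """Return only the segments that the given predicate accepts."""
    return [segment for segment in segments if is_walkable(segment)]


navteq_segments = [
    {"id": 1, "Pedestrian Access": "Y"},
    {"id": 2, "Pedestrian Access": "N"},
]
osm_ways = [
    {"highway": "footway"},
    {"highway": "motorway"},
    {"highway": "residential", "foot": "no"},
]

print([s["id"] for s in filter_segments(navteq_segments, navteq_is_walkable)])
print(filter_segments(osm_ways, osm_is_walkable))
```

Keeping one predicate per data source reflects the point made in the text that each dataset needs its own extraction rule, while the rest of the pipeline can stay the same.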
Shortest Route Generation
Before each dataset could be tested for shortest path lengths, random points were generated for each urban area. These were then used as potential start and end points of shortest path routes. The possible position of each random point was restricted to a predefined buffer. This buffer was generated by creating five-meter buffers around each street network (generated from the different data sources) and intersecting these buffers. This ensured that a given start or end point would be snapped to the correct street segment in every dataset.

Next, a random set of 1000 origin-destination point pairs was extracted from the random points in such a way that the resulting network distances approximate a typical distance decay function for walking trips (18). This way the random route set reflects typical pedestrian walking distances. A sample size of 1000 routes was chosen since such a sample was found to cover many parts of the networks, both less dense areas such as parks and built-up areas. FIGURE 1 shows a map with a sample of 1000 pedestrian routes generated for Berlin. A larger number of routes would result in overlapping tracks and violate the independence of observations that was presumed for further statistical analysis. Some origin-destination point combinations caused a situation of non-connectivity, e.g. a start point on an island and an end point on the mainland, where no shortest route could be generated. Whenever this occurred, the paired points and corresponding routes were removed, which reduced the total number of origin-destination point pairs to 996 for Miami, 993 for San Francisco, 978 for Munich and 977 for Berlin. ESRI's ArcGIS 10 Geographic Information System was used for all analysis procedures in this paper.
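The routing step and the removal of non-connected point pairs can be sketched as follows. The study used ArcGIS 10; this sketch substitutes a plain Dijkstra search on a toy graph, and the node names, edge lengths, and function names are hypothetical. It also assumes that a pair is dropped as soon as any one dataset cannot route it, which the text does not state explicitly.

```python
import heapq


def build_graph(edges):
    """Build an undirected adjacency map from (node, node, length_m) tuples."""
    graph = {}
    for a, b, length in edges:
        graph.setdefault(a, {})[b] = length
        graph.setdefault(b, {})[a] = length
    return graph


def shortest_path_length(graph, origin, destination):
    """Dijkstra's algorithm; returns None when the pair is not connected."""
    best = {origin: 0.0}
    queue = [(0.0, origin)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == destination:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for neighbor, length in graph.get(node, {}).items():
            candidate = dist + length
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return None


def comparable_route_lengths(graphs, od_pairs):
    """Route every pair in every dataset; keep pairs routable in all of them."""
    result = {}
    for pair in od_pairs:
        lengths = {name: shortest_path_length(g, *pair) for name, g in graphs.items()}
        if all(length is not None for length in lengths.values()):
            result[pair] = lengths
    return result


# Toy networks: the second one adds a footbridge between A and C.
# Nodes I and J form an "island" that is not connected to the mainland.
roads = [("A", "B", 600), ("B", "C", 600), ("I", "J", 100)]
graphs = {
    "roads_only": build_graph(roads),
    "with_footbridge": build_graph(roads + [("A", "C", 150)]),
}
routes = comparable_route_lengths(graphs, [("A", "C"), ("A", "I")])
print(routes)
```

In this toy case the pair (A, I) is discarded because no route exists, and the remaining pair shows how a single pedestrian segment changes the shortest path length between two otherwise identical networks.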
If noticeable differences in shortest path length are found for a given point pair between two underlying datasets, they can be attributed to two types of error, i.e., errors of omission and errors of commission (FIGURE 2). There are several ways to identify these errors, such as the use of aerial photographs or street view images. An error of omission occurs if the dataset is missing a segment that is crucial for computing a shortest route, while this segment exists in the real world. FIGURE 2a gives an example of an error of omission found in the Berlin network. TomTom, shown in dark blue, is missing a segment in the street network, in this case a pedestrian bridge leading across a major highway. This causes a detour for the pedestrian of more than 1000 meters (0.6 miles) compared to the correct shortest path. As can be seen, OSM, shown in green, contains the pedestrian bridge and thus provides the (correct) shortest route in this area.

However, not every additional segment found in one network dataset but not in the others exists in reality; such a segment would falsely represent a network shortcut. This situation leads to an error of commission, as shown in FIGURE 2b for Miami. TIGER/Line data, shown in cyan, indicate that there are several pedestrian paths running in a north-south direction. However, a current satellite image in the background shows that these paths do not exist. TomTom, shown in red, provides a more realistic dataset for this specific area. If data affected by this error of commission were not removed before the distance analysis, results would incorrectly show that TIGER/Line provides shorter shortest paths than other datasets in this area.

While errors of commission were rarely found in any of the datasets (in our case only Miami and Berlin showed any), they still needed to be identified during the analysis to give realistic and representative results.
The focus of this paper was on additional shortcuts provided by any of the data sets, where errors of commission would be misleading and favor the datasets that contain them. Each time one of the datasets showed an error of commission during the analysis and in comparison to routes found for other datasets, the erroneous dataset was corrected, the network rebuilt, and the test repeated. Errors of omission, however, allow a realistic assessment of the shortest pedestrian route capability of different networks. They were not corrected before further analysis.

It must be noted that some online route planners, such as Google Maps, provide walking directions. However, these route planners use proprietary network datasets which cannot be accessed and downloaded for analysis. This restriction makes it impossible to compute buffers around street segments (which are necessary for start and end point generation). Further, it complicates the analysis of errors of omission and commission. Pedestrian routes from online route planners were therefore not considered in this analysis, although they may be considered in future work.

RESULTS
This section starts with an overview of the total length differences between different networks. Total lengths are based on approximately 1000 calculated shortest routes for pedestrians found in the four urban area networks. Differences are then analyzed for their statistical significance.

The bar charts in FIGURE 3 show the total length of all calculated shortest routes based on the different datasets used in each city. For the US cities, differences between datasets were small. In Miami the largest difference between two data providers was observed between OSM and NAVTEQ with 33 km or 3%, with an average route length of 1019 m for OSM and 1050 m for NAVTEQ.
NAVTEQ showed the largest overall length value in this city, indicating that for the specific routes that were tested NAVTEQ provided the fewest shortcuts for pedestrians. For San Francisco, OSM and TomTom showed the largest difference between two data providers with 16 km or 1.8% and an average route length of 864 m for OSM and 880 m for TomTom.

In German cities, the various datasets reveal larger differences in total shortest path lengths than in US cities. While ATKIS, the purchasable federal dataset for Germany, provides significantly shorter pedestrian routes than the commercial TomTom dataset for Berlin, OSM provides significantly shorter paths than the other two datasets. Differences between OSM and TomTom are as large as 112 km or 11% in Berlin, with an average route length of 899 m for OSM and 1013 m for TomTom. A similar pattern could be observed for Munich, where only OSM and TomTom were available. There, differences between OSM and TomTom are 76 km or 8%, with an average route length of 893 m for OSM and 971 m for TomTom.

Previous research showed large differences between OSM and TomTom in German cities with regard to the amount of pedestrian-related data in the network, e.g. footpath shortcuts or pedestrian bridges (28). The larger amount of pedestrian-related data in OSM leads to shorter pedestrian routes, as shown in FIGURE 3c and FIGURE 3d, and explains the observed difference in path lengths. For US cities, previous analysis revealed only relatively small differences in pedestrian-related segments (28), which causes smaller differences in shortest path lengths. However, the results of our analysis showed that OSM, although not to the same extent as in Germany, still provides the most complete free data source for pedestrian routes in both US cities, as shown in FIGURE 3a and FIGURE 3b.
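The reported totals and percentages can be roughly reproduced from the route counts and average route lengths given in the text. The short sketch below does this under two assumptions that the paper does not state: that the total difference equals the number of routes times the difference in average length, and that the percentage is taken relative to the longer dataset. The function and variable names are hypothetical.

```python
# Route counts and mean route lengths (m) as reported in the text.
ROUTE_COUNT = {"Miami": 996, "San Francisco": 993, "Munich": 978, "Berlin": 977}
MEAN_LENGTH_M = {
    "Miami": {"OSM": 1019, "NAVTEQ": 1050},
    "San Francisco": {"OSM": 864, "TomTom": 880},
    "Munich": {"OSM": 893, "TomTom": 971},
    "Berlin": {"OSM": 899, "TomTom": 1013},
}


def length_gap(city):
    """Return (total gap in km, gap as a share of the longer dataset)."""
    shorter, longer = sorted(MEAN_LENGTH_M[city].values())
    gap_m = longer - shorter
    # Assumption: total gap = number of routes x gap in mean length.
    return ROUTE_COUNT[city] * gap_m / 1000, gap_m / longer


for city in MEAN_LENGTH_M:
    total_km, share = length_gap(city)
    print(f"{city}: {total_km:.1f} km, {share:.1%}")
```

Under these assumptions the values for Berlin, Munich and San Francisco land close to the reported 112 km, 76 km and 16 km. The Miami value comes out a little below the reported 33 km, which may simply reflect rounding of the published averages.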
Statistical Analysis
A series of nonparametric Wilcoxon matched-pair signed rank tests was conducted to determine where the differences in median distances of shortest paths between different datasets were significant. A Student's t-test could not be used because route lengths were not normally distributed. FIGURE 4a shows a normal quantile plot (Q-Q plot) for one of the sample datasets, which clearly indicates that the data are not normally distributed. A logarithmic transformation of the route lengths did not lead to a normal distribution either, but to a bimodal distribution (FIGURE 4b), which makes the use of a t-test inappropriate.

Since more than two datasets were compared for most cities, a Bonferroni correction was applied to the test. The Bonferroni correction is a common method to handle problems caused by multiple comparisons. Applying a test multiple times causes the overall probability of rejecting a true null hypothesis (i.e., a Type I error) to exceed α (the significance level associated with the individual test). To retain an overall Type I error probability of at most α, a critical value can be found by dividing α by the number of tests carried out (Bonferroni correction).

FIGURE 5 gives an overview of the statistical results. Applying the Bonferroni correction reduces α to 0.05/3 = 0.017 in the case of Berlin, since three combinations were tested. The α value was left unchanged for Munich (only one comparison), and was adjusted to 0.008 for Miami (six combinations) and 0.017 for San Francisco (three combinations). The significance values in FIGURE 5 relate to individual tests. If a shown significance value is below the Bonferroni-corrected α value, as specified above for each city, the difference in median path distance can be considered significant at the 5% level of significance.
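The corrected α values quoted above follow directly from the number of pairwise dataset comparisons per city. As a minimal sketch, the Python snippet below reproduces them; the dataset counts are inferred from the datasets named for each city in the text, while the function names and the example p-value are hypothetical.

```python
from math import comb


def bonferroni_alpha(n_datasets, alpha=0.05):
    """Per-test significance level when all dataset pairs are compared."""
    n_tests = comb(n_datasets, 2)  # number of pairwise combinations
    return alpha / n_tests


def is_significant(p_value, n_datasets, alpha=0.05):
    """Compare an individual test's p-value with the corrected threshold."""
    return p_value < bonferroni_alpha(n_datasets, alpha)


# Number of datasets per city, inferred from the text (assumption).
DATASETS_PER_CITY = {"Berlin": 3, "Munich": 2, "Miami": 4, "San Francisco": 3}

for city, n_datasets in DATASETS_PER_CITY.items():
    print(city, comb(n_datasets, 2), round(bonferroni_alpha(n_datasets), 3))
```

A hypothetical p-value of 0.01 illustrates the effect of the correction: it would count as significant for Berlin (threshold 0.017) but not for Miami (threshold 0.008).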
A plus (+) sign indicates that the street network listed in the row header leads to significantly longer shortest paths than the street network denoted in the column header, while a minus (-) sign indicates the opposite. For example, in San Francisco, the OSM-based network results in significantly shorter routes than the TIGER/Line network, which is indicated by the minus sign in the first row.

The statistical results indicate that in the case of Miami, where little OSM data has been collected after the TIGER/Line import, none of the median distances are significantly different from one another. San Francisco, however, shows how active contribution to OSM can help to improve connectivity. More specifically, the statistical test results show that the median distances are significantly smaller for OSM than for TIGER/Line, which can be attributed to voluntary data collection efforts for OSM after the TIGER/Line import into the OSM database. However, length differences between OSM and TomTom in San Francisco were not significant at the 5% level of significance.

FIGURE 3c and FIGURE 3d indicate larger differences in total shortest path lengths between commercial and freely available datasets for German cities than for the US. For both German cities tested in our analysis, the statistical results show significantly shorter distances for OSM in comparison to the TomTom and ATKIS data sets.

DISCUSSION AND FUTURE WORK
The results of our analysis show that the length of shortest paths generated for pedestrians is affected by the completeness of the underlying dataset. Further, it can be observed that some urban street networks do not provide a significant number of shortcuts (e.g. pedestrian bridges) that facilitate shorter routes for pedestrians compared to the road network.
In these areas, even if all existing pedestrian shortcuts were integrated in the network dataset, shortest path lengths would not differ much from those generated from a road network without pedestrian-specific shortcuts. In contrast, especially for pedestrian-oriented cities with plazas, pedestrian paths and alleys, such as Berlin and Munich, the choice of dataset greatly affects the length of shortest paths. OSM data provide a free and relatively comprehensive alternative for cities where commercial pedestrian datasets are not yet available, especially if there are active participants contributing data, such as in Germany.

A combination of freely available and commercial datasets would possibly provide the best coverage. However, it is important to note that a combined dataset including OSM and commercial datasets cannot be considered for implementation due to current licensing issues. OSM can be used freely under the terms of the Creative Commons Attribution-Share Alike 2.0 license, which permits copying, distributing, adapting and transmitting the data. Commercial use of the data is not prohibited either; however, the user must attribute the work in the manner specified by OSM and, most importantly, the resulting work must be released under the same license as the one used by OSM or a similar one.

The number of sample cities with commercial pedestrian street data used in this study is too small to make definite statements about shortest path differences between OSM and commercial pedestrian datasets. However, for all cities analyzed in the US and Germany, OSM data resulted in shorter shortest paths for pedestrians than commercial data sets. This observation indicates that the OSM community is active in the US as well and helps to develop a comprehensive network of pedestrian paths, as can be seen particularly in the San Francisco area.
For the individual traveler who uses a GPS-enabled cell phone or handheld device, the inclusion of pedestrian-related network segments has the advantage of providing more realistic routes that avoid unnecessary detours compared to systems that use street-bound data only. While this study measured the relative completeness of different network datasets by shortest path distances, actual routing applications will usually allow a search for alternative routes that are optimized for criteria other than travel distance, including safe routes, scenic routes, or routes with the fewest slopes. In this case, additional edge attributes or turn costs need to be considered in the routing algorithm (16). Nevertheless, even if an alternative route, such as a scenic route, is requested by the user of a routing system, a more complete network will provide a better search result than a sparse one, simply by avoiding unnecessary detours. The integration of heterogeneous (free) datasets is still a challenge (23), and the assessment of VGI data quality, especially of OSM data, is an ongoing issue of high importance for successful geo-applications (11, 12).

ACKNOWLEDGEMENTS
The authors thank NAVTEQ and Tele Atlas for generously providing us with sample datasets of the US street network. Alexander Zipf (University of Heidelberg, Germany) facilitated the use of TeleAtlas data for Germany for this study.